Subtitle-free Movie to Script Alignment
Authors
Abstract
A standard solution for aligning scripts to movies is to use dynamic time warping with the subtitles (Everingham et al., BMVC 2006). We investigate the problem of aligning scripts to TV video and movies when subtitles are not available, e.g. for silent films or for non-verbal film passages. To this end we identify a number of “modes of alignment” and train classifiers for each of them. The modes include visual features, such as locations and face recognition, and audio features, such as speech. Each feature gives some alignment information but is too noisy when used independently. We show that combining the different features into a single cost function and optimizing it using dynamic programming leads to performance superior to that of each individual feature. The method is assessed on episodes of the situation comedy Seinfeld, and on Charlie Chaplin and Indian movies.
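The idea of combining several noisy features into one cost and optimizing it with dynamic programming can be sketched as follows. This is a minimal illustration, not the authors' cost function: the feature names (`face_cost`, `location_cost`), the equal weights, and the constraint that script sentences map to shots in non-decreasing order are all assumptions made for the example.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_costs(costs: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of per-feature cost matrices (rows: sentences, columns: shots)."""
    n, m = len(costs[0]), len(costs[0][0])
    return [
        [sum(w * c[i][j] for c, w in zip(costs, weights)) for j in range(m)]
        for i in range(n)
    ]


def align(cost: Matrix) -> List[int]:
    """Assign each sentence to a shot, keeping shot indices non-decreasing,
    so that the total cost is minimal (dynamic programming)."""
    n, m = len(cost), len(cost[0])
    dp = [[float("inf")] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    dp[0] = list(cost[0])
    for i in range(1, n):
        best_k = 0  # argmin of dp[i-1][0..j], updated as j grows
        for j in range(m):
            if dp[i - 1][j] < dp[i - 1][best_k]:
                best_k = j
            dp[i][j] = cost[i][j] + dp[i - 1][best_k]
            back[i][j] = best_k
    # Backtrack from the cheapest final shot.
    j = min(range(m), key=lambda x: dp[n - 1][x])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path


# Hypothetical costs for 3 script sentences and 4 shots (lower is better).
face_cost = [[0.2, 0.2, 0.9, 0.9],
             [0.9, 0.3, 0.3, 0.9],
             [0.9, 0.9, 0.4, 0.4]]
location_cost = [[0.1, 0.8, 0.8, 0.8],
                 [0.8, 0.8, 0.1, 0.8],
                 [0.8, 0.8, 0.8, 0.1]]

combined = combine_costs([face_cost, location_cost], [0.5, 0.5])
alignment = align(combined)
print(alignment)  # [0, 2, 3]
```

In this toy example the face feature alone leaves several shots tied for each sentence, while the combined cost yields a single ordered assignment. In practice the weights would have to be tuned or learned, and the monotonicity constraint is one possible modelling choice.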
Similar resources
Context-driven automatic bilingual movie subtitle alignment
Movie subtitle alignment is a potentially useful approach for deriving automatically parallel bilingual/multilingual spoken language data for automatic speech translation. In this paper, we consider the movie subtitle alignment task. We propose a distance metric between utterances of different languages based on lexical features derived from bilingual dictionaries. We use the dynamic time warpi...
Improved Sentence Alignment for Movie Subtitles
Sentence alignment is an essential step in building a parallel corpus. In this paper a specialized approach for the alignment of movie subtitles based on time overlaps is introduced. It is used for creating an extensive multilingual parallel subtitle corpus currently containing about 21 million aligned sentence fragments in 29 languages. Our alignment approach yields significantly higher accura...
Using Movie Subtitles for Creating a Large-Scale Bilingual Corpora
This paper presents a method for compiling a large-scale bilingual corpus from a database of movie subtitles. To create the corpus, we propose an algorithm based on Gale and Church’s sentence alignment algorithm (1993). However, our algorithm relies not only on character length information but also on subtitle-timing information, which is encoded in the subtitle files. Timing is highly correl...
Synopsis Alignment: Importing External Text Information for Multi-model Movie Analysis
Text information, which plays an important role in news video concept detection, has been ignored in state-of-the-art movie analysis technology. This is because movie subtitles are the speech of characters, which does not directly describe the content of the movie and contributes little to movie analysis. In this paper, we import collaboratively edited synopses from professional movie sites for movie analysis, which ...
High-quality bilingual subtitle document alignments with application to spontaneous speech translation
In this paper, we investigate the task of translating spontaneous speech transcriptions by employing aligned movie subtitles in training a statistical machine translator (SMT). In contrast to the lexical-based dynamic time warping (DTW) approaches to bilingual subtitle alignment, we align subtitle documents using time-stamps. We show that subtitle time-stamps in two languages are often approxim...